Results 1 - 16 of 16
1.
PLoS One ; 16(10): e0258623, 2021.
Article in English | MEDLINE | ID: mdl-34653224

ABSTRACT

Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth of the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted of substituting synonymous terms with their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (CNNs) on large breast cancer gene expression data and on other cancer datasets. The performance of the resulting models was compared to that of Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed sufficiently well on the metastatic event prediction tasks compared to other networks. Such performance was good enough to validate the utility of our generated word embeddings in constructing biological networks. Word representations produced by text mining algorithms like word2vec are therefore able to capture biologically meaningful relations between entities. Our generated embeddings are publicly available at https://github.com/genexplain/Word2vec-based-Networks/blob/main/README.md.
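
Illustrative note: the abstract relates cosine similarity between word vectors to known biological relations and mentions gene-gene networks extracted from embeddings. The sketch below shows one plausible way such a network could be derived, by keeping term pairs whose cosine similarity reaches a threshold. The toy vectors, the gene names, and the threshold value are assumptions made for illustration; the abstract does not state how the authors selected edges.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def build_similarity_network(embeddings, threshold):
    """Return (term_a, term_b, similarity) for every pair at or above threshold.

    The thresholding rule is an assumption, not the published procedure.
    """
    edges = []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine_similarity(embeddings[a], embeddings[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Hypothetical three-dimensional embeddings (real word2vec vectors are much longer).
toy_embeddings = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [0.9, 0.1, 0.0],
    "GENE_C": [0.0, 1.0, 0.0],
}
network = build_similarity_network(toy_embeddings, threshold=0.8)
```

With these toy vectors only the GENE_A / GENE_B pair passes the threshold, since the other pairs are nearly orthogonal.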


Subjects
Breast Neoplasms/genetics; Computational Biology/methods; Data Mining/methods; Algorithms; Breast Neoplasms/metabolism; Databases, Factual; Female; Gene Expression Regulation, Neoplastic; Humans; Machine Learning; Neural Networks, Computer; Protein Interaction Maps; Terminology as Topic
2.
Genome Med ; 13(1): 42, 2021 03 11.
Article in English | MEDLINE | ID: mdl-33706810

ABSTRACT

BACKGROUND: Contemporary deep learning approaches show cutting-edge performance in a variety of complex prediction tasks. Nonetheless, the application of deep learning in healthcare remains limited since deep learning methods are often considered non-interpretable black-box models. However, the machine learning community has recently developed interpretability methods that explain data point-specific decisions of deep learning techniques. We believe that such explanations can support personalized precision medicine decisions by explaining patient-specific predictions. METHODS: Layer-wise Relevance Propagation (LRP) is a technique to explain decisions of deep learning methods. It is widely used to interpret Convolutional Neural Networks (CNNs) applied to image data. Recently, CNNs have been extended to non-Euclidean domains like graphs. Molecular networks are commonly represented as graphs detailing interactions between molecules. Gene expression data can be assigned to the vertices of these graphs. In other words, gene expression data can be structured by utilizing molecular network information as prior knowledge. Graph-CNNs can be applied to structured gene expression data, for example, to predict metastatic events in breast cancer. Therefore, there is a need for explanations showing which part of a molecular network is relevant for predicting an event, e.g., distant metastasis in cancer, for each individual patient. RESULTS: We extended the procedure of LRP to make it available for Graph-CNN and tested its applicability on a large breast cancer dataset. We present Graph Layer-wise Relevance Propagation (GLRP) as a new method to explain the decisions made by Graph-CNNs. We demonstrate a sanity check of the developed GLRP on a hand-written digits dataset and then apply the method to gene expression data. We show that GLRP provides patient-specific molecular subnetworks that largely agree with clinical knowledge and identify common as well as novel, and potentially druggable, drivers of tumor progression. CONCLUSIONS: The developed method could potentially be highly useful for interpreting classification results in the context of different omics data and prior knowledge molecular networks on the individual patient level, as for example in precision medicine approaches or a molecular tumor board.
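
Illustrative note: as background for the LRP technique named in the abstract, the sketch below applies the widely used epsilon rule to a single dense (fully connected) layer, redistributing output relevance to the inputs in proportion to each input's contribution. It is not the graph extension (GLRP) described by the authors; the function name, the toy weights, and the choice of the epsilon rule are assumptions for illustration.

```python
def lrp_epsilon_linear(activations, weights, relevance_out, bias=None, eps=1e-9):
    """Propagate relevance through one dense layer with the LRP epsilon rule.

    activations:   list of input activations a_i
    weights:       weights[i][j] connects input i to output j
    relevance_out: relevance R_j assigned to each output
    Returns the relevance R_i assigned to each input.
    """
    n_in = len(activations)
    n_out = len(relevance_out)
    if bias is None:
        bias = [0.0] * n_out
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # Pre-activation of output j.
        z = bias[j] + sum(activations[i] * weights[i][j] for i in range(n_in))
        # Stabiliser keeps the denominator away from zero.
        z_stab = z + (eps if z >= 0 else -eps)
        for i in range(n_in):
            contribution = activations[i] * weights[i][j]
            relevance_in[i] += contribution / z_stab * relevance_out[j]
    return relevance_in


# Toy layer: two inputs, two outputs, no bias.
toy_relevance = lrp_epsilon_linear(
    activations=[1.0, 2.0],
    weights=[[0.5, -1.0], [0.25, 1.0]],
    relevance_out=[1.0, 0.5],
)
```

Without a bias term and with a very small epsilon, the total relevance is approximately conserved: the input relevances sum to roughly the same value as the output relevances.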


Subjects
Breast Neoplasms/genetics; Breast Neoplasms/pathology; Gene Regulatory Networks; Neural Networks, Computer; Algorithms; Female; Gene Expression Regulation, Neoplastic; Humans; Neoplasm Metastasis; Protein Interaction Maps/genetics; Signal Transduction/genetics
3.
Dis Model Mech ; 13(11)2020 11 27.
Article in English | MEDLINE | ID: mdl-32958515

ABSTRACT

Inflammatory bowel diseases (IBDs) cause significant morbidity and mortality. Aberrant NF-κB signalling is strongly associated with these conditions, and several established drugs influence the NF-κB signalling network to exert their effect. This study aimed to identify drugs that alter NF-κB signalling and could be repositioned for use in IBD. The SysmedIBD Consortium established a novel drug-repurposing pipeline based on a combination of in silico drug discovery and biological assays targeted at demonstrating an impact on NF-κB signalling, and a murine model of IBD. The drug discovery algorithm identified several drugs already established in IBD, including corticosteroids. The highest-ranked drug was the macrolide antibiotic clarithromycin, which has previously been reported to have anti-inflammatory effects in aseptic conditions. The effects of clarithromycin were validated in several experiments: it influenced NF-κB-mediated transcription in murine peritoneal macrophages and intestinal enteroids; it suppressed NF-κB protein shuttling in murine reporter enteroids; it suppressed NF-κB (p65) DNA binding in the small intestine of mice exposed to lipopolysaccharide; and it reduced the severity of dextran sulphate sodium-induced colitis in C57BL/6 mice. Clarithromycin also suppressed NF-κB (p65) nuclear translocation in human intestinal enteroids. These findings demonstrate that in silico drug repositioning algorithms can viably be allied to laboratory validation assays in the context of IBD, and that further clinical assessment of clarithromycin in the management of IBD is required. This article has an associated First Person interview with the joint first authors of the paper.


Subjects
Drug Repositioning; Inflammatory Bowel Diseases/drug therapy; Inflammatory Bowel Diseases/pathology; Systems Analysis; Animals; Cells, Cultured; Clarithromycin/pharmacology; Clarithromycin/therapeutic use; Colitis/chemically induced; Colitis/metabolism; Colitis/pathology; DNA/metabolism; Dextran Sulfate; Gene Regulatory Networks; Humans; Inflammatory Bowel Diseases/metabolism; Lipopolysaccharides; Luciferases/metabolism; Mice, Inbred C57BL; NF-kappa B/metabolism; Organoids/drug effects; Organoids/metabolism; Protein Binding/drug effects; Signal Transduction; Transcription Factors/metabolism; Tumor Necrosis Factor-alpha/metabolism
4.
Int J Mol Sci ; 21(8)2020 Apr 14.
Article in English | MEDLINE | ID: mdl-32295185

ABSTRACT

Accumulation of lipid-laden (foam) cells in the arterial wall is known to be the earliest step in the pathogenesis of atherosclerosis. There is almost no doubt that atherogenic modified low-density lipoproteins (LDL) are the main sources of accumulating lipids in foam cells. Atherogenic modified LDL are taken up by arterial cells, such as macrophages, pericytes, and smooth muscle cells in an unregulated manner bypassing the LDL receptor. The present study was conducted to reveal possible common mechanisms in the interaction of macrophages with associates of modified LDL and non-lipid latex particles of a similar size. To determine regulatory pathways that are potentially responsible for cholesterol accumulation in human macrophages after the exposure to naturally occurring atherogenic or artificially modified LDL, we used transcriptome analysis. Previous studies of our group demonstrated that any type of LDL modification facilitates the self-association of lipoprotein particles. The size of such self-associates hinders their interaction with a specific LDL receptor. As a result, self-associates are taken up by nonspecific phagocytosis bypassing the LDL receptor. That is why we used latex beads as a stimulator of macrophage phagocytotic activity. We revealed at least 12 signaling pathways that were regulated by the interaction of macrophages with the multiple-modified atherogenic naturally occurring LDL and with latex beads in a similar manner. Therefore, modified LDL was shown to stimulate phagocytosis through the upregulation of certain genes. We have identified at least three genes (F2RL1, EIF2AK3, and IL15) encoding inflammatory molecules and associated with signaling pathways that were upregulated in response to the interaction of modified LDL with macrophages. Knockdown of two of these genes, EIF2AK3 and IL15, completely suppressed cholesterol accumulation in macrophages. Correspondingly, the upregulation of EIF2AK3 and IL15 promoted cholesterol accumulation. 
These data confirmed our hypothesis of the following chain of events in atherosclerosis: LDL particles undergo atherogenic modification; this is accompanied by the formation of self-associates; large LDL associates stimulate phagocytosis; as a result of phagocytosis stimulation, pro-inflammatory molecules are secreted; these molecules cause or at least contribute to the accumulation of intracellular cholesterol. This chain of events may explain the relationship between cholesterol accumulation and inflammation. The primary sequence of events in this chain is related to inflammatory response rather than cholesterol accumulation.


Subjects
Cholesterol/metabolism; Foam Cells/metabolism; Lipid Metabolism; Signal Transduction; Biomarkers; Disease Susceptibility; Foam Cells/pathology; Gene Expression Profiling; Humans; Inflammation/etiology; Inflammation/metabolism; Inflammation/pathology; Inflammation Mediators/metabolism; Macrophages/metabolism; Macrophages/pathology; Models, Biological
5.
Int J Mol Sci ; 21(3)2020 Jan 27.
Article in English | MEDLINE | ID: mdl-32012706

ABSTRACT

Excessive accumulation of lipid inclusions in the arterial wall cells (foam cell formation) caused by modified low-density lipoprotein (LDL) is the earliest and most noticeable manifestation of atherosclerosis. The mechanisms of foam cell formation are not fully understood and can involve altered lipid uptake, impaired lipid metabolism, or both. Recently, we have identified the top 10 master regulators that were involved in the accumulation of cholesterol in cultured macrophages induced by the incubation with modified LDL. It was found that most of the identified master regulators were related to the regulation of the inflammatory immune response, but not to lipid metabolism. A possible explanation for this unexpected result is a stimulation of the phagocytic activity of macrophages by modified LDL particle associates that have a relatively large size. In the current study, we investigated gene regulation in macrophages using transcriptome analysis to test the hypothesis that the primary event occurring upon the interaction of modified LDL and macrophages is the stimulation of phagocytosis, which subsequently triggers the pro-inflammatory immune response. We identified genes that were up- or downregulated following the exposure of cultured cells to modified LDL or latex beads (inert phagocytosis stimulators). Most of the identified master regulators were involved in the innate immune response, and some of them were encoding major pro-inflammatory proteins. The obtained results indicated that pro-inflammatory response to phagocytosis stimulation precedes the accumulation of intracellular lipids and possibly contributes to the formation of foam cells. In this way, the currently recognized hypothesis that the accumulation of lipids triggers the pro-inflammatory response was not confirmed. Comparative analysis of master regulators revealed similarities in the genetic regulation of the interaction of macrophages with naturally occurring LDL and desialylated LDL. 
Oxidized and desialylated LDL affected a different spectrum of genes than naturally occurring LDL. These observations suggest that desialylation is the most important modification of LDL occurring in vivo. Thus, modified LDL caused the gene regulation characteristic of the stimulation of phagocytosis. Additionally, the knock-down effect of five master regulators (IL15, EIF2AK3, F2RL1, TSPYL2, and ANXA1) on intracellular lipid accumulation was tested. We knocked down these genes in primary macrophages derived from human monocytes. The addition of atherogenic naturally occurring LDL caused a significant accumulation of cholesterol in the control cells. The knock-down of the EIF2AK3 and IL15 genes completely prevented cholesterol accumulation in cultured macrophages. The knock-down of the ANXA1 gene caused a further decrease in cholesterol content in cultured macrophages. At the same time, knock-down of F2RL1 and TSPYL2 did not cause an effect. The results obtained allowed us to explain how the inflammatory response and the accumulation of cholesterol are related, confirming our hypothesis of atherogenesis development based on the following viewpoints: LDL particles undergo atherogenic modifications that are, in turn, accompanied by the formation of self-associates; large LDL associates stimulate phagocytosis; as a result of phagocytosis stimulation, pro-inflammatory molecules are secreted; these molecules cause or at least contribute to the accumulation of intracellular cholesterol. Therefore, it became obvious that the primary event in this sequence is not the accumulation of cholesterol but an inflammatory response.


Subjects
Foam Cells/metabolism; Foam Cells/pathology; Lipoproteins, LDL/metabolism; Phagocytosis; Biomarkers; Foam Cells/immunology; Gene Expression Profiling; Gene Knockdown Techniques; Humans; Immunity, Innate; Lipid Metabolism; Macrophages/immunology; Macrophages/metabolism; Monocytes/immunology; Monocytes/metabolism; Oxidation-Reduction; Phagocytosis/genetics; Phagocytosis/immunology; Signal Transduction; Transcriptome
6.
BMC Bioinformatics ; 20(Suppl 4): 119, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999858

ABSTRACT

BACKGROUND: The search for molecular biomarkers of early-onset colorectal cancer (CRC) is an important but still quite challenging and unsolved task. Detection of CpG methylation in human DNA obtained from blood or stool has been proposed as a promising approach to a noninvasive early diagnosis of CRC. Thousands of abnormally methylated CpG positions in CRC genomes are often located in non-coding parts of genes. Novel bioinformatic methods are thus urgently needed for multi-omics data analysis to reveal causative biomarkers with a potential driver role in early stages of cancer. METHODS: We have developed a method for finding potential causal relationships between epigenetic changes (DNA methylations) in gene regulatory regions that affect transcription factor binding sites (TFBS) and gene expression changes. This method also considers the topology of the involved signal transduction pathways and searches for positive feedback loops that may cause the carcinogenic aberrations in gene expression. We call this method "Walking pathways", since it searches for potential rewiring mechanisms in cancer pathways due to dynamic changes in the DNA methylation status of important gene regulatory regions ("epigenomic walking"). RESULTS: In this paper, we analysed an extensive collection of full genome gene-expression data (RNA-seq) and DNA methylation data of genomic CpG islands (using Illumina methylation arrays) generated from a sample of tumor and normal gut epithelial tissues of 300 patients with colorectal cancer (at different stages of the disease) (data generated in the EU-supported SysCol project). Identification of potential epigenetic biomarkers of DNA methylation was performed using the fully automatic multi-omics analysis web service "My Genome Enhancer" (MGE) (my-genome-enhancer.com). 
MGE uses the database on gene regulation TRANSFAC®, the signal transduction pathways database TRANSPATH®, and software that employs AI (artificial intelligence) methods for the analysis of cancer-specific enhancers. CONCLUSIONS: The identified biomarkers underwent experimental testing on an independent set of blood samples from patients with colorectal cancer. As a result, using advanced methods of statistics and machine learning, a minimum set of 6 biomarkers was selected, which together achieve the best cancer detection potential. The markers include hypermethylated positions in regulatory regions of the following genes: CALCA, ENO1, MYC, PDX1, TCF7, ZNF43.


Subjects
Biomarkers, Tumor/genetics; Colorectal Neoplasms/genetics; DNA Methylation/genetics; Feedback, Physiological; Signal Transduction/genetics; Binding Sites/genetics; Colorectal Neoplasms/diagnosis; Colorectal Neoplasms/pathology; CpG Islands/genetics; Epigenesis, Genetic; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Humans; Male; Middle Aged; Neoplasm Staging; Transcription Factors/metabolism
7.
Epigenomics ; 10(8): 1103-1119, 2018 08.
Article in English | MEDLINE | ID: mdl-30070582

ABSTRACT

AIM: To integrate transcriptomic and DNA-methylomic measurements on varicose versus normal veins using a systems biological analysis to shed light on the interplay between genetic and epigenetic factors. MATERIALS & METHODS: Differential expression and methylation were measured using microarrays, supported by real-time quantitative PCR and immunohistochemistry confirmation for relevant gene products. A systems biological 'upstream analysis' was further applied. RESULTS: We identified several potential key players contributing to extracellular matrix remodeling in varicose veins. Specifically, our analysis suggests that MFAP5 acts as a master regulator, upstream of integrins, of the cellular network affecting the varicose vein condition. A possible mechanism and pathogenic model were outlined. CONCLUSION: The proposed coherent model incorporates the relevant signaling networks and will hopefully aid further studies on varicose vein pathogenesis.


Subjects
Contractile Proteins/genetics; Extracellular Matrix; Glycoproteins/genetics; Varicose Veins/genetics; Adult; DNA Methylation; Female; Gene Expression Profiling; Humans; Intercellular Signaling Peptides and Proteins; Male; Middle Aged; Saphenous Vein
8.
BMC Med Genomics ; 11(Suppl 1): 12, 2018 02 13.
Article in English | MEDLINE | ID: mdl-29504919

ABSTRACT

BACKGROUND: The small molecule Nutlin-3 reactivates p53 in cancer cells by interacting with the complex between p53 and its repressor Mdm-2 and causing an increase in cancer cell apoptosis. Therefore, Nutlin-3 has potent anticancer properties. Clinical and experimental studies of Nutlin-3 showed that some cancer cells may lose sensitivity to this compound. Here we analyze possible mechanisms for insensitivity of cancer cells to Nutlin-3. METHODS: We applied the upstream analysis approach implemented in the geneXplain platform (genexplain.com) using the TRANSFAC® database of transcription factors and their binding sites in the genome and the TRANSPATH® database of signal transduction networks, with associated software such as Match™ and Composite Module Analyst (CMA). RESULTS: Using genome-wide gene expression profiling we compared several lung cancer cell lines and showed that expression programs executed in Nutlin-3 insensitive cell lines differ significantly from those of Nutlin-3 sensitive cell lines. Using the artificial intelligence approach embedded in the CMA software, we identified a set of transcription factors cooperatively binding to the promoters of genes up-regulated in the Nutlin-3 insensitive cell lines. Graph analysis of the signal transduction network upstream of these transcription factors allowed us to identify potential master regulators responsible for maintaining such low sensitivity to Nutlin-3, with the most promising candidate being mTOR, which acts in the context of an activated PI3K pathway. These findings were validated experimentally using an array of chemical inhibitors. CONCLUSIONS: We showed that the Nutlin-3 insensitive cell lines are actually highly sensitive to the dual PI3K/mTOR inhibitor NVP-BEZ235, while not responding to either the PI3K-specific compound LY294002 or the Bcl-XL-specific compound 2,3-DCPE.


Subjects
Drug Resistance, Neoplasm; Imidazoles/pharmacology; Lung Neoplasms/pathology; Phosphatidylinositol 3-Kinases/metabolism; Piperazines/pharmacology; Protein Kinase Inhibitors/pharmacology; TOR Serine-Threonine Kinases/metabolism; Tumor Suppressor Protein p53/metabolism; Apoptosis; Cell Proliferation; Humans; Lung Neoplasms/drug therapy; Lung Neoplasms/metabolism; Phosphatidylinositol 3-Kinases/genetics; Signal Transduction; TOR Serine-Threonine Kinases/genetics; Tumor Cells, Cultured; Tumor Suppressor Protein p53/genetics
9.
EuPA Open Proteom ; 13: 1-13, 2016 Dec.
Article in English | MEDLINE | ID: mdl-29900117

ABSTRACT

We present an "upstream analysis" strategy for causal analysis of multiple "-omics" data. It analyzes promoters using the TRANSFAC database, combines it with an analysis of the upstream signal transduction pathways and identifies master regulators as potential drug targets for a pathological process. We applied this approach to a complex multi-omics data set that contains transcriptomics, proteomics and epigenomics data. We identified the following potential drug targets against induced resistance of cancer cells towards chemotherapy by methotrexate (MTX): TGFalpha, IGFBP7, alpha9-integrin, and the following chemical compounds: zardaverine and divalproex as well as human metabolites such as nicotinamide N-oxide.

10.
Microarrays (Basel) ; 4(2): 270-86, 2015 May 21.
Article in English | MEDLINE | ID: mdl-27600225

ABSTRACT

A strategy is presented that allows a causal analysis of co-expressed genes, which may be subject to common regulatory influences. A state-of-the-art promoter analysis for potential transcription factor (TF) binding sites in combination with a knowledge-based analysis of the upstream pathways that control the activity of these TFs is shown to lead to hypothetical master regulators. This strategy was implemented as a workflow in a comprehensive bioinformatic software platform. We applied this workflow to gene sets that were identified by a novel triclustering algorithm in naphthalene-induced gene expression signatures of murine liver and lung tissue. As a result, tissue-specific master regulators were identified that are known to be linked with tumorigenic and apoptotic processes. To our knowledge, this is the first time that genes of expression triclusters were used to identify upstream regulators.
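
Illustrative note: the second step of the strategy above, searching the signalling network upstream of a set of transcription factors for candidate master regulators, can be pictured as a bounded graph traversal. The sketch below walks a directed network against the direction of signal flow and ranks upstream nodes by how many of the input TFs they can reach. The node names, the step limit, and the ranking criterion are assumptions for illustration; the published workflow uses a curated pathway database and its own scoring, which the abstract does not specify.

```python
from collections import deque


def upstream_candidates(edges, transcription_factors, max_steps):
    """Rank nodes found within max_steps upstream of the given TFs.

    edges: iterable of (source, target) pairs, signal flowing source -> target.
    Returns a list of (node, set_of_reachable_TFs), best candidates first.
    """
    # Reverse adjacency: target -> direct upstream nodes.
    upstream = {}
    for source, target in edges:
        upstream.setdefault(target, set()).add(source)

    reached_by = {}  # candidate node -> TFs it can reach
    for tf in transcription_factors:
        seen = {tf}
        queue = deque([(tf, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_steps:
                continue  # do not walk further than the step limit
            for parent in upstream.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    reached_by.setdefault(parent, set()).add(tf)
                    queue.append((parent, depth + 1))

    # More reachable TFs first; ties broken alphabetically for determinism.
    return sorted(reached_by.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical toy network.
toy_edges = [
    ("KINASE_X", "TF1"),
    ("KINASE_X", "TF2"),
    ("RECEPTOR_Y", "KINASE_X"),
    ("KINASE_Z", "TF3"),
]
ranking = upstream_candidates(toy_edges, ["TF1", "TF2", "TF3"], max_steps=2)
```

In this toy network KINASE_X and RECEPTOR_Y each reach two of the three TFs and rank ahead of KINASE_Z, which reaches only one.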

11.
PLoS Comput Biol ; 9(3): e1002958, 2013.
Article in English | MEDLINE | ID: mdl-23555204

ABSTRACT

Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention during the last years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question which motif clusters to aim at has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.
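
Illustrative note: the abstract describes the information coverage (IC) criterion only as the fraction of a motif's information that an alignment covers. The sketch below shows one way such a fraction could be computed, assuming the usual Shannon information content of a DNA motif column against a uniform background (2 bits minus the column entropy). The function names and the toy motif are hypothetical, and the exact definition used in the paper may differ.

```python
import math


def column_information(column):
    """Information content in bits of one motif column (probabilities over A, C, G, T)."""
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return 2.0 - entropy  # assumes a uniform background over four bases


def information_coverage(motif, start, length):
    """Fraction of the motif's total information lying in columns [start, start + length)."""
    total = sum(column_information(col) for col in motif)
    if total == 0:
        return 0.0
    covered = sum(column_information(col) for col in motif[start:start + length])
    return covered / total


# Toy motif with column information contents of 2, 0, 2 and 1 bits.
toy_motif = [
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
coverage_first_two = information_coverage(toy_motif, start=0, length=2)
coverage_last_two = information_coverage(toy_motif, start=2, length=2)
```

An alignment over the first two columns covers 2 of the 5 bits in this toy motif, while one over the last two columns covers 3 of 5, so the second alignment would count as covering more of the motif's information even though both span two positions.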


Subjects
Computational Biology/methods; Nucleotide Motifs; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Algorithms; Cluster Analysis; DNA/genetics; DNA/metabolism; Databases, Genetic; Gene Regulatory Networks; Logistic Models; Phylogeny; Transcription Factors/genetics; Transcription Factors/metabolism
12.
PLoS One ; 6(3): e17738, 2011 Mar 28.
Article in English | MEDLINE | ID: mdl-21464922

ABSTRACT

The molecular causes by which the epidermal growth factor receptor tyrosine kinase induces malignant transformation are largely unknown. To better understand the transforming capacity of EGF, whole genome scans were applied to a transgenic mouse model of liver cancer and subjected to advanced methods of computational analysis to construct de novo gene regulatory networks based on a combination of sequence analysis and entrained graph-topological algorithms. Here we identified transcription factors, processes, key nodes and molecules to connect as yet unknown interacting partners at the level of protein-DNA interaction. Many of those could be confirmed by electromobility band shift assay at recognition sites of gene specific promoters and by western blotting of nuclear proteins. A novel cellular regulatory circuitry could therefore be proposed that connects cell cycle regulated genes with components of the EGF signaling pathway. Promoter analysis of differentially expressed genes suggested the majority of regulated transcription factors to display specificity to either the pre-tumor or the tumor state. Subsequent search for signal transduction key nodes upstream of the identified transcription factors and their targets suggested the insulin-like growth factor pathway to render the tumor cells independent of EGF receptor activity. Notably, expression of IGF2 in addition to many components of this pathway was highly upregulated in tumors. Together, we propose a switch in autocrine signaling to foster tumor growth that was initially triggered by EGF and demonstrate the knowledge gain from promoter analysis combined with upstream key node identification.


Subjects
Computational Biology/methods; Epidermal Growth Factor/metabolism; Liver Neoplasms/genetics; Liver Neoplasms/pathology; Precancerous Conditions/genetics; Precancerous Conditions/pathology; Animals; Binding Sites; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Cell Cycle/genetics; Cluster Analysis; DNA, Neoplasm/metabolism; Disease Models, Animal; Down-Regulation/genetics; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks/genetics; Genes, Neoplasm/genetics; Lipid Metabolism/genetics; Mice; Mice, Transgenic; Promoter Regions, Genetic/genetics; Protein Binding; Signal Transduction/genetics; Transcription Factors/metabolism; Up-Regulation/genetics
13.
BMC Syst Biol ; 4: 124, 2010 Sep 06.
Article in English | MEDLINE | ID: mdl-20815942

ABSTRACT

BACKGROUND: The study of relationships between human diseases provides new possibilities for biomedical research. Recent achievements on human genetic diseases have stimulated interest to derive methods to identify disease associations in order to gain further insight into the network of human diseases and to predict disease genes. RESULTS: Using about 10000 manually collected causal disease/gene associations, we developed a statistical approach to infer meaningful associations between human morbidities. The derived method clustered cardiometabolic and endocrine disorders, immune system-related diseases, solid tissue neoplasms and neurodegenerative pathologies into prominent disease groups. Analysis of biological functions confirmed characteristic features of corresponding disease clusters. Inference of disease associations was further employed as a starting point for prediction of disease genes. Efforts were made to underpin the validity of results by relevant literature evidence. Interestingly, many inferred disease relationships correspond to known clinical associations and comorbidities, and several predicted disease genes were subjects of therapeutic target research. CONCLUSIONS: Causal molecular mechanisms present a unifying principle to derive methods for disease classification, analysis of clinical disorder associations, and prediction of disease genes. According to the definition of causal disease genes applied in this study, these results are not restricted to genetic disease/gene relationships. This may be particularly useful for the study of long-term or chronic illnesses, where pathological derangement due to environmental or as part of sequel conditions is of importance and may not be fully explained by genetic background.


Subjects
Computational Biology/methods; Disease/genetics; Humans; Molecular Sequence Annotation; Reproducibility of Results
14.
BMC Bioinformatics ; 11: 225, 2010 May 03.
Article in English | MEDLINE | ID: mdl-20438625

ABSTRACT

BACKGROUND: Knowledge of transcription factor-DNA binding patterns is crucial for understanding gene transcription. Numerous DNA-binding proteins are annotated as transcription factors in the literature; however, for many of them the corresponding DNA-binding motifs remain uncharacterized. RESULTS: The position weight matrices (PWMs) of transcription factors from different structural classes were determined using a knowledge-based statistical potential. The scoring function, calibrated against crystallographic data on protein-DNA contacts, recovered PWMs of various members of widely studied transcription factor families such as p53 and NF-kappaB. Where possible, extensive comparisons to experimental binding affinity data and other physical models were made. Although the p50p50, p50RelB, and p50p65 dimers belong to the same family, particular differences in their PWMs were detected, suggesting possibly different in vivo binding modes. The PWMs of p63 and p73 were computed on the basis of homology modeling, and their performance was studied using upstream sequences of 85 p53/p73-regulated human genes. Interestingly, about half of the p63 and p73 hits reported by the Match algorithm across all 126 promoters lay more than 2 kb upstream of the corresponding transcription start sites (TSSs), which deviates from the common assumption that most regulatory sites are located closer to the TSS. The fact that in most cases the binding sites of p63 and p73 did not overlap with the p53 sites suggests that p63 and p73 could influence p53 transcriptional activity cooperatively. The newly computed p50p50 PWM recovered 5 more experimental binding sites than the corresponding TRANSFAC matrix, while both PWMs showed comparable receiver operating characteristics. CONCLUSIONS: A novel algorithm was developed to calculate position weight matrices from protein-DNA complex structures. The proposed algorithm was extensively validated against experimental data. The method was further combined with homology modeling to obtain PWMs of factors for which crystallographic complexes with DNA are not yet available. The performance of the PWMs obtained in this work, compared with traditionally constructed matrices, demonstrates that the structure-based approach is a promising alternative to experimental determination of transcription factor binding properties.
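
To make the idea of deriving a PWM from per-base binding energies concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that some scoring function has already assigned an energy to each base at each site position, and it uses a Boltzmann-style weighting to turn those energies into probabilities. The names `energies_to_pwm`, `log_odds_score`, the `kT` parameter, the uniform background, and the toy energy values are all hypothetical.

```python
import math

BASES = "ACGT"

def energies_to_pwm(energies, kT=1.0):
    """Turn per-position base energies into per-position base probabilities.

    Assumption: lower energy means more favourable binding, and
    probabilities follow exp(-E / kT) normalised within each position.
    """
    pwm = []
    for column in energies:
        weights = {b: math.exp(-column[b] / kT) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: w / total for b, w in weights.items()})
    return pwm

def log_odds_score(pwm, site, background=0.25):
    """Score a candidate site as a sum of log2 odds against a uniform background."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, site))

# Hypothetical energies in arbitrary units (not taken from the paper).
toy_energies = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},  # position prefers A
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},  # position prefers C
    {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0},  # position is uninformative
]
toy_pwm = energies_to_pwm(toy_energies)
```

With these toy values, a site such as `"ACA"` scores higher than `"CAA"`, and the third position contributes nothing to the score because all four bases are equally likely there.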


Subjects
Computational Biology/methods , DNA-Binding Proteins/chemistry , Transcription Factors/chemistry , Algorithms , Binding Sites , Thermodynamics
15.
BMC Genomics ; 8: 378, 2007 Oct 18.
Article in English | MEDLINE | ID: mdl-17945028

ABSTRACT

BACKGROUND: A substantial fraction of the non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, approximately 5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. RESULTS: We investigated the relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500 relative to the transcription start site); and (iii) random sequences with the same nucleotide composition as CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) the high AT content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) the presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of the short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. CONCLUSION: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those that flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that the selection that causes their conservation is not always very strong.
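
The comparison of motif abundances between CNSs and a background set can be illustrated with a simple k-mer frequency ratio. This is only a sketch under stated assumptions, not the procedure used in the study: the function names `kmer_frequencies` and `relative_abundance`, the pseudocount, and the toy sequences are hypothetical, and no statistical testing is attempted.

```python
from collections import Counter

def kmer_frequencies(sequences, k):
    """Return the relative frequency of every k-mer over a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def relative_abundance(foreground, background, k, pseudo=1e-6):
    """Ratio of foreground to background k-mer frequency (above 1 means excess)."""
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {
        kmer: (fg.get(kmer, 0.0) + pseudo) / (bg.get(kmer, 0.0) + pseudo)
        for kmer in set(fg) | set(bg)
    }

# Toy sequences for illustration only (not real CNS or background data).
toy_cns = ["AAAATTTT", "TTTTAAAA"]
toy_background = ["ACGTACGT", "CGCGATAT"]
ratios = relative_abundance(toy_cns, toy_background, k=2)
```

In this toy input the AT-rich dinucleotide `AA` has a ratio well above 1 and `CG` a ratio well below 1. Repeating the calculation against a composition-matched random background would correspond to background set (iii) in the abstract.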


Subjects
Conserved Sequence , DNA/genetics , Humans
16.
Genome Inform ; 15(2): 276-86, 2004.
Article in English | MEDLINE | ID: mdl-15706513

ABSTRACT

Based on the manual annotation of transcription factors stored in the TRANSFAC database, we developed a library of hidden Markov models (HMMs) to represent their DNA-binding domains and used it for a comprehensive classification. The constructed models were applied to the UniProt/Swiss-Prot database, leading to a systematic classification of further DNA-binding protein entries. The resulting HMM library can be used to classify any newly discovered transcription factor according to its DNA-binding domain and, thus, to generate hypotheses about its DNA-binding specificity.
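
Classification against a library of HMMs can be sketched as scoring a sequence under every model and taking the best one. The example below is a deliberately small illustration, assuming plain discrete HMMs scored with the forward algorithm rather than the profile HMMs a real domain library would use. The function names, the two-letter alphabet, the class names, and all probabilities are hypothetical.

```python
def forward_likelihood(model, sequence):
    """Probability of the sequence under a discrete HMM (forward algorithm)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    start, trans, emit = model["start"], model["trans"], model["emit"]
    states = list(start)
    # Initialise with the start distribution and the first emission.
    alpha = {s: start[s] * emit[s].get(sequence[0], 0.0) for s in states}
    # Propagate through the remaining symbols.
    for symbol in sequence[1:]:
        alpha = {
            s: emit[s].get(symbol, 0.0) * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

def classify(library, sequence):
    """Return the name of the model that assigns the sequence the highest likelihood."""
    return max(library, key=lambda name: forward_likelihood(library[name], sequence))

def _toy_model(first, second):
    # Two-state left-to-right model; emission tables are made up.
    return {
        "start": {"s1": 1.0, "s2": 0.0},
        "trans": {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s1": 0.0, "s2": 1.0}},
        "emit": {"s1": first, "s2": second},
    }

k_rich = {"K": 0.8, "L": 0.2}
l_rich = {"K": 0.2, "L": 0.8}
toy_library = {
    "class_A": _toy_model(k_rich, l_rich),  # K-rich start, L-rich end
    "class_B": _toy_model(l_rich, k_rich),  # L-rich start, K-rich end
}
```

Here `classify(toy_library, "KKLL")` selects `class_A` and `classify(toy_library, "LLKK")` selects `class_B`. For sequences of realistic length, the forward recursion would need to work in log space or with scaling to avoid numerical underflow.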


Subjects
DNA-Binding Proteins/chemistry , DNA-Binding Proteins/classification , Genome , Transcription Factors , Binding Sites , Computational Biology , Databases, Factual , Databases, Protein , Helix-Turn-Helix Motifs , Response Elements , Sequence Alignment , Sequence Analysis, Protein , T-Box Domain Proteins , Transcription Factors/chemistry , Transcription Factors/classification
...